Art of Assembly: Chapter Thirteen
- 13.3.10 - Blocked File I/O
- 13.3.11 - The Program Segment Prefix (PSP)
13.3.10 Blocked File I/O
The examples in the previous section suffer from a major drawback: they
are extremely slow. The performance problems with the code above are entirely
due to DOS. Making a DOS call is not, shall we say, the fastest operation
in the world. Calling DOS every time we want to read or write a single character
from/to a file will bring the system to its knees. As it turns out, it doesn't
take (practically) any more time to have DOS read or write two characters
than it does to read or write one character. Since the amount of time we
(usually) spend processing the data is negligible compared to the amount
of time DOS takes to return or write the data, reading two characters at
a time will essentially double the speed of the program. If reading two
characters doubles the processing speed, how about reading four characters?
Sure enough, it almost quadruples the processing speed. Likewise, processing
ten characters at a time increases the processing speed by almost an order
of magnitude. Alas, this progression doesn't continue forever. There comes
a point of diminishing returns, when it takes far too much memory to justify
a (very) small improvement in performance (keeping in mind that reading
64K in a single operation requires a 64K memory buffer to hold the data).
A good compromise is 256 or 512 bytes. Reading more data doesn't really
improve the performance much, yet a 256 or 512 byte buffer is easier to
deal with than larger buffers.
Reading data in groups or blocks is called blocked I/O. Blocked I/O is often
one to two orders of magnitude faster than single character I/O, so obviously
you should use blocked I/O whenever possible.
There is one minor drawback to blocked I/O: it's a little more complex
to program than single character I/O. Consider the example presented in
the section on the DOS read command:
Example: This example opens a file and reads it to the EOF
        mov     ah, 3dh          ;Open the file
        mov     al, 0            ;Open for reading
        lea     dx, Filename     ;Presume DS points at filename
        int     21h              ; segment
        jc      BadOpen
        mov     FHndl, ax        ;Save file handle
LP:     mov     ah, 3fh          ;Read data from the file
        lea     dx, Buffer       ;Address of data buffer
        mov     cx, 1            ;Read one byte
        mov     bx, FHndl        ;Get file handle value
        int     21h
        jc      ReadError
        cmp     ax, cx           ;EOF reached?
        jne     EOF
        mov     al, Buffer       ;Get character read
        putc                     ;Print it (IOSHELL call)
        jmp     LP               ;Read next byte
EOF:    mov     bx, FHndl
        mov     ah, 3eh          ;Close file
        int     21h
        jc      CloseError
There isn't much to this program at all. Now consider the same example rewritten
to use blocked I/O:
Example: This example opens a file and reads it to the EOF using blocked
I/O
        mov     ah, 3dh          ;Open the file
        mov     al, 0            ;Open for reading
        lea     dx, Filename     ;Presume DS points at filename
        int     21h              ; segment
        jc      BadOpen
        mov     FHndl, ax        ;Save file handle
LP:     mov     ah, 3fh          ;Read data from the file
        lea     dx, Buffer       ;Address of data buffer
        mov     cx, 256          ;Read 256 bytes
        mov     bx, FHndl        ;Get file handle value
        int     21h
        jc      ReadError
        cmp     ax, cx           ;EOF reached?
        jne     EOF
        mov     si, 0            ;Note: CX=256 at this point.
PrtLp:  mov     al, Buffer[si]   ;Get character read
        putc                     ;Print it
        inc     si
        loop    PrtLp
        jmp     LP               ;Read next block
; Note, just because the number of bytes read doesn't equal 256,
; don't get the idea we're through, there could be up to 255 bytes
; in the buffer still waiting to be processed.
EOF:    mov     cx, ax
        jcxz    EOF2             ;If CX is zero, we're really done.
        mov     si, 0            ;Process the last block of data read
Finis:  mov     al, Buffer[si]   ; from the file which contains
        putc                     ; 1..255 bytes of valid data.
        inc     si
        loop    Finis
EOF2:   mov     bx, FHndl
        mov     ah, 3eh          ;Close file
        int     21h
        jc      CloseError
This example demonstrates one major hassle with blocked I/O: when you reach
the end of file, you haven't necessarily processed all of the data in the
file. If the block size is 256 and there are 255 bytes left in the file,
DOS will return an EOF condition (the number of bytes read doesn't match the
request). In this case, we've still got to process the characters that were
read. The code above does this in a rather straightforward manner, using
a second loop to finish up when the EOF is reached. You've probably noticed
that the two print loops are virtually identical. This program can be reduced
in size somewhat using the following code, which is only a little more complex:
Example: This example opens a file and reads it to the EOF using blocked
I/O
        mov     ah, 3dh          ;Open the file
        mov     al, 0            ;Open for reading
        lea     dx, Filename     ;Presume DS points at filename
        int     21h              ; segment.
        jc      BadOpen
        mov     FHndl, ax        ;Save file handle
LP:     mov     ah, 3fh          ;Read data from the file
        lea     dx, Buffer       ;Address of data buffer
        mov     cx, 256          ;Read 256 bytes
        mov     bx, FHndl        ;Get file handle value
        int     21h
        jc      ReadError
        mov     bx, ax           ;Save byte count for later
        mov     cx, ax
        jcxz    EOF              ;Nothing read, so we're done
        mov     si, 0            ;Note: CX=bytes read (1..256) here.
PrtLp:  mov     al, Buffer[si]   ;Get character read
        putc                     ;Print it
        inc     si
        loop    PrtLp
        cmp     bx, 256          ;Reach EOF yet?
        je      LP               ;Full block read, so try for another
EOF:    mov     bx, FHndl
        mov     ah, 3eh          ;Close file
        int     21h
        jc      CloseError
Blocked I/O works best on sequential files, that is, files you read or write
from beginning to end without seeking. When dealing with random access
files, you should read or write whole records at one time, using a single DOS
read/write call to process each record. This is still considerably
faster than manipulating the data one byte at a time.
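As a rough sketch of the whole-record approach, the fragment below seeks to a record with DOS function 42h and then reads it with a single call to function 3Fh. It is only an illustration: RecNum, RecBuf, SeekError, and ShortRead are hypothetical names, RecSize is an assumed equate for the record length in bytes, and FHndl is presumed to hold an open file handle as in the examples above.

```asm
; Sketch only: read record number RecNum (0-based) into RecBuf.
; RecSize is an assumed equate, e.g., "RecSize equ 128".
        mov     ax, RecNum       ;Record index
        mov     cx, RecSize
        mul     cx               ;DX:AX = byte offset of the record
        mov     cx, dx           ;DOS expects the offset in CX:DX
        mov     dx, ax
        mov     bx, FHndl        ;Get file handle value
        mov     ax, 4200h        ;AH=42h (seek), AL=0 (from file start)
        int     21h
        jc      SeekError

        mov     ah, 3fh          ;Read the whole record in one call
        mov     bx, FHndl
        mov     cx, RecSize
        lea     dx, RecBuf       ;Presume DS points at RecBuf's segment
        int     21h
        jc      ReadError
        cmp     ax, cx           ;Did we get a complete record?
        jne     ShortRead        ;No, hit EOF inside (or before) it
```

One DOS call per record keeps the per-call overhead proportional to the number of records rather than the number of bytes.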
13.3.11 The Program Segment Prefix (PSP)
When a program is loaded into memory for execution, DOS first builds
a program segment prefix (PSP) in the memory immediately preceding the
program. This PSP contains lots of information, some of it useful, some of
it obsolete. Understanding the layout of the PSP is essential for programmers
designing assembly language programs.
The PSP is 256 bytes long and contains the following information:
Offset  Length  Description
0       2       An INT 20h instruction is stored here
2       2       Program ending address
4       1       Unused, reserved by DOS
5       5       Call to DOS function dispatcher
0Ah     4       Address of program termination code
0Eh     4       Address of break handler routine
12h     4       Address of critical error handler routine
16h     22      Reserved for use by DOS
2Ch     2       Segment address of environment area
2Eh     34      Reserved by DOS
50h     3       INT 21h, RETF instructions
53h     9       Reserved by DOS
5Ch     16      Default FCB #1
6Ch     20      Default FCB #2
80h     1       Length of command line string
81h     127     Command line string
Note: locations 80h..FFh are used for the default DTA.
Most of the information in the PSP is of little use to a modern MS-DOS assembly
language program. Buried in the PSP, however, are a couple of gems that
are worth knowing about. Just for completeness, however, we'll take a look
at all of the fields in the PSP.
The first field in the PSP contains an int 20h instruction. Int 20h
is an obsolete mechanism used to terminate program
execution. Back in the early days of DOS v1.0, your program would execute
a jmp to this location in order to terminate. Nowadays, of
course, we have DOS function 4Ch which is much easier (and safer) than jumping
to location zero in the PSP. Therefore, this field is obsolete.
Field number two contains a value which points at the last paragraph allocated
to your program. By subtracting the segment address of the PSP from this value, you
can determine the amount of memory (in 16-byte paragraphs) allocated to your
program (and quit if there is insufficient memory available).
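The subtraction might look something like the following sketch. It assumes DS points at your data segment and that the PSP segment was saved in the word variable PSP by the startup code shown later in this section; MinParas (an equate for the minimum number of paragraphs the program needs) and NoMemory are hypothetical names used only for illustration.

```asm
; Sketch only: check how much memory DOS gave this program.
        mov     es, PSP              ;ES = segment address of the PSP
        mov     ax, word ptr es:[2]  ;Program ending address (a segment)
        mov     bx, es
        sub     ax, bx               ;AX = paragraphs from the PSP to the
                                     ; program ending address
        cmp     ax, MinParas         ;MinParas is an assumed equate
        jb      NoMemory             ;Quit if there isn't enough memory
```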
The third field is the first of many "holes" left in the PSP by
Microsoft. Why they're here is anyone's guess.
The fourth field is a call to the DOS function dispatcher. The purpose of
this (now obsolete) DOS calling mechanism was to allow some additional compatibility
with CP/M-80 programs. For modern DOS programs, there is absolutely no need
to worry about this field.
The next three fields are used to store special addresses during the execution
of a program. These fields contain the default terminate vector, break vector,
and critical error handler vector. These are the values normally stored
in the interrupt vectors for int 22h, int 23h, and int 24h. Because DOS keeps
a copy of these vector values in the PSP, you can change the vectors so that
they point into your own code. When your program terminates, DOS restores those
three vectors from these three fields in the PSP. For more details on these
interrupt vectors, please consult the DOS technical reference manual.
The eighth field in the PSP record is another reserved field, currently
unavailable for use by your programs.
The ninth field is a real gem. It's the address of the environment
strings area. This is a two-byte pointer which contains the segment address
of the environment storage area. The environment strings always begin at
offset zero within this segment. The environment string area consists
of a sequence of zero-terminated strings. It uses the following format:
string1 0 string2 0 string3 0 ... 0 stringn 0 0
That is, the environment area consists of a list of zero terminated strings,
the list itself being terminated by a string of length zero (i.e., a zero
all by itself, or two zeros in a row, however you want to look at it). Strings
are (usually) placed in the environment area via DOS commands like PATH,
SET, etc. Generally, a string in the environment area takes the form
name=parameters
For example, the "SET IPATH=C:\ASSEMBLY\INCLUDE" command copies
the string "IPATH=C:\ASSEMBLY\INCLUDE" into the environment string
storage area.
Many languages scan the environment storage area to find default filename
paths and other pieces of default information set up by DOS. Your programs
can take advantage of this as well.
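A minimal sketch of such a scan appears below; it prints each environment string on its own line. It assumes DS points at your data segment and that the PSP segment was saved in the word variable PSP by the startup code shown later in this section. To avoid depending on any library routine, it prints characters with DOS function 02h (character in DL); the labels are hypothetical.

```asm
; Sketch only: print every string in the environment area.
        mov     es, PSP               ;ES = segment address of the PSP
        mov     es, word ptr es:[2Ch] ;ES = environment segment
        xor     si, si                ;Strings begin at offset zero
NextStr: cmp    byte ptr es:[si], 0   ;A zero-length string ends the list
        je      EnvDone
PrtChr: mov     dl, es:[si]           ;Get next character of this string
        inc     si
        or      dl, dl                ;Terminating zero?
        jz      EndStr
        mov     ah, 2                 ;DOS: write character in DL
        int     21h
        jmp     PrtChr
EndStr: mov     dl, 0Dh               ;Finish the line with CR/LF
        mov     ah, 2
        int     21h
        mov     dl, 0Ah
        mov     ah, 2
        int     21h
        jmp     NextStr               ;SI now points at the next string
EnvDone:
```

To look up a particular variable, you would compare each string's leading characters against the name you want (followed by "=") rather than printing it.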
The next field in the PSP is another block of reserved storage, currently
undefined by DOS.
The 11th field in the PSP is another call to the DOS function dispatcher.
Why this call exists (when the one at location 5 in the PSP already exists
and nobody really uses either mechanism to call DOS) is an interesting question.
In general, this field should be ignored by your programs.
The 12th field is another block of unused bytes in the PSP which should
be ignored.
The 13th and 14th fields in the PSP are the default FCBs (File Control Blocks).
File control blocks are another archaic data structure carried over from
CP/M-80. FCBs are used only with the obsolete DOS v1.0 file handling routines,
so they are of little interest to us. We'll ignore these FCBs in the PSP.
Locations 80h through the end of the PSP contain a very important piece
of information: the command line parameters typed on the DOS command line
along with your program's name. If the following is typed on the DOS command
line:
MYPGM parameter1, parameter2
the following is stored into the command line parameter field:
23, " parameter1, parameter2", 0Dh
Location 80h contains 23 (decimal), the length of the parameters following the program
name. Locations 81h through 97h contain the characters making up the parameter
string. Location 98h contains a carriage return. Notice that the carriage
return character is not figured into the length of the command line string.
Processing the command line string is such an important facet of assembly
language programming that this process will be discussed in detail in the
next section.
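In the meantime, here is a small sketch that simply echoes the parameter string using the length byte at location 80h. It assumes DS points at your data segment and that the PSP segment was saved in the word variable PSP by the startup code shown later in this section; NoParms is a hypothetical label, and DOS function 02h is used for output so the fragment does not depend on any library routine.

```asm
; Sketch only: print the command line parameters stored in the PSP.
        mov     es, PSP               ;ES = segment address of the PSP
        mov     cl, byte ptr es:[80h] ;Length of the parameter string
        xor     ch, ch                ;CX = length (CR not included)
        jcxz    NoParms               ;Nothing typed after the name
        mov     si, 81h               ;First character of the string
PrtCmd: mov     dl, es:[si]
        mov     ah, 2                 ;DOS: write character in DL
        int     21h
        inc     si
        loop    PrtCmd
NoParms:
```

Because the loop is driven by the length byte, it stops before the carriage return at the end of the string.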
Locations 80h..FFh in the PSP also comprise the default DTA. Therefore,
if you don't use DOS function 1Ah to change the DTA and you execute a FIND
FIRST FILE, the filename information will be stored starting at location
80h in the PSP.
One important detail we've omitted until now is exactly how you access data
in the PSP. Although the PSP is loaded into memory immediately before your
program, that doesn't necessarily mean that it appears 100h bytes before
your code. Your data segments may have been loaded into memory before your
code segments, thereby invalidating this method of locating the PSP. The
segment address of the PSP is passed to your program in the ds register.
To store the PSP address away in your data segment, your programs should
begin with the following code:
        push    ds               ;Save PSP value
        mov     ax, seg DSEG     ;Point DS and ES at our data
        mov     ds, ax           ; segment.
        mov     es, ax
        pop     PSP              ;Store PSP value into "PSP"
                                 ; variable.
        .
        .
        .
Another way to obtain the PSP address, in DOS 5.0 and later, is to make
a DOS call. If you load ah with 51h and execute an int 21h
instruction, MS-DOS will return the segment address of the current
PSP in the bx register.
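A sketch of that call follows. It assumes DS already points at the data segment containing the word variable PSP used in the startup code above.

```asm
; Sketch only: ask DOS for the current PSP segment.
        mov     ah, 51h          ;Get current PSP address
        int     21h              ;BX = segment address of the PSP
        mov     PSP, bx          ;Store it into the "PSP" variable
```

This is handy when you need the PSP somewhere other than at program entry, after the original value in ds has been overwritten.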
There are lots of tricky things you can do with the data in the PSP. Peter
Norton's Programmer's Guide to the IBM PC lists all kinds of tricks. Such
operations won't be discussed here because they're a little beyond the scope
of this manual.
Art of Assembly: Chapter Thirteen - 28 SEP 1996